17 research outputs found

    Towards an Atlas of Computational Learning Theory

    A major part of our knowledge about Computational Learning stems from comparisons of the learning power of different learning criteria. These comparisons inform about trade-offs between learning restrictions and, more generally, learning settings; furthermore, they inform about what restrictions can be observed without losing learning power. With this paper, we propose that one main focus of future research in Computational Learning should be on a structured approach to determine the relations of different learning criteria. In particular, we propose that, for small sets of learning criteria, all pairwise relations should be determined; these relations can then be easily depicted as a map, a diagram detailing the relations. Once we have maps for many relevant sets of learning criteria, the collection of these maps is an Atlas of Computational Learning Theory, informing at a glance about the landscape of computational learning just as a geographical atlas informs about the earth. In this paper, we work toward this goal by providing three example maps, one pertaining to partially set-driven learning, and two pertaining to strongly monotone learning. These maps can serve as blueprints for future maps of similar base structure.

    The Minimization of Random Hypergraphs

    We investigate the maximum-entropy model B_{n,m,p} for random n-vertex, m-edge multi-hypergraphs with expected edge size pn. We show that the expected size of the minimization min(B_{n,m,p}), i.e., the number of inclusion-wise minimal edges of B_{n,m,p}, undergoes a phase transition with respect to m. If m is at most 1/(1-p)^{(1-p)n}, then E[|min(B_{n,m,p})|] is of order Θ(m), while for m ≥ 1/(1-p)^{(1-p+ε)n} for any ε > 0, it is Θ(2^{(H(α) + (1-α) log_2 p) n}/√n). Here, H denotes the binary entropy function and α = - (log_{1-p} m)/n. The result implies that the maximum expected number of minimal edges over all m is Θ((1+p)^n/√n). Our structural findings have algorithmic implications for minimizing an input hypergraph. This has applications in the profiling of relational databases as well as for the Orthogonal Vectors problem studied in fine-grained complexity. We make several technical contributions that are of independent interest in probability. First, we improve the Chernoff-Hoeffding theorem on the tail of the binomial distribution. In detail, we show that for a binomial variable Y ~ Bin(n,p) and any 0 < x < p, it holds that P[Y ≤ xn] = Θ(2^{-D(x‖p) n}/√n), where D is the binary Kullback-Leibler divergence between Bernoulli distributions. We give explicit upper and lower bounds on the constants hidden in the big-O notation that hold for all n. Secondly, we establish the fact that the probability of a set of cardinality i being minimal after m i.i.d. maximum-entropy trials exhibits a sharp threshold behavior at i^* = n + log_{1-p} m.
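To make the notion of minimization concrete, here is a minimal Python sketch. It assumes that an edge of B_{n,m,p} is drawn by including each of the n vertices independently with probability p, and that duplicate edges of the multi-hypergraph collapse to one in the minimization; the function names `sample_hypergraph` and `minimization` are hypothetical and not taken from the paper. The quadratic-time subset check is a naive baseline, not the algorithm analyzed in the abstract.

```python
import random


def sample_hypergraph(n, m, p, seed=None):
    """Sample m edges over vertices 0..n-1.

    Assumption: each vertex joins each edge independently with probability p,
    so the expected edge size is p * n.
    """
    rng = random.Random(seed)
    return [
        frozenset(v for v in range(n) if rng.random() < p)
        for _ in range(m)
    ]


def minimization(edges):
    """Return the set of inclusion-wise minimal edges.

    An edge is minimal if no other edge is a proper subset of it.
    Duplicate edges are collapsed first (an assumption of this sketch).
    """
    distinct = set(edges)
    return {e for e in distinct if not any(other < e for other in distinct)}


if __name__ == "__main__":
    hypergraph = sample_hypergraph(n=12, m=50, p=0.5, seed=1)
    print(len(minimization(hypergraph)), "minimal edges out of", len(hypergraph))
```

Running the script for growing m with n and p fixed gives a rough empirical view of how |min(B_{n,m,p})| changes with the number of edges.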

    The Parameterized Complexity of Dependency Detection in Relational Databases

    We study the parameterized complexity of classical problems that arise in the profiling of relational data. Namely, we characterize the complexity of detecting unique column combinations (candidate keys), functional dependencies, and inclusion dependencies with the solution size as parameter. While the discovery of uniques and functional dependencies, respectively, turns out to be W[2]-complete, the detection of inclusion dependencies is one of the first natural problems proven to be complete for the class W[3]. As a side effect, our reductions give insights into the complexity of enumerating all minimal unique column combinations or functional dependencies.
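The three dependency types named above can be illustrated by simple verification routines. The sketch below only checks whether a given candidate holds on a table; it says nothing about the hardness of finding small candidates, which is the subject of the abstract. Tables are assumed to be lists of tuples, columns are addressed by index, and all function names are hypothetical.

```python
def is_unique(rows, cols):
    """True if no two rows agree on every column in cols."""
    seen = set()
    for row in rows:
        key = tuple(row[c] for c in cols)
        if key in seen:
            return False
        seen.add(key)
    return True


def holds_fd(rows, lhs, rhs):
    """True if the functional dependency lhs -> rhs holds,
    i.e. rows that agree on the columns lhs also agree on column rhs."""
    image = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        if image.setdefault(key, row[rhs]) != row[rhs]:
            return False
    return True


def holds_ind(rows_r, cols_r, rows_s, cols_s):
    """True if every value combination of cols_r in table R
    also appears in cols_s of table S (inclusion dependency)."""
    available = {tuple(row[c] for c in cols_s) for row in rows_s}
    return all(tuple(row[c] for c in cols_r) in available for row in rows_r)


if __name__ == "__main__":
    table = [(1, "a", "x"), (2, "a", "x"), (3, "b", "y")]
    print(is_unique(table, [0]), holds_fd(table, [1], 2))
```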

    Space-Efficient Fault-Tolerant Diameter Oracles

    We design $f$-edge fault-tolerant diameter oracles ($f$-FDOs). We preprocess a given graph $G$ on $n$ vertices and $m$ edges, and a positive integer $f$, to construct a data structure that, when queried with a set $F$ of $|F| \leq f$ edges, returns the diameter of $G-F$. For a single failure ($f=1$) in an unweighted directed graph of diameter $D$, there exists an approximate FDO by Henzinger et al. [ITCS 2017] with stretch $(1+\varepsilon)$, constant query time, space $O(m)$, and a combinatorial preprocessing time of $\widetilde{O}(mn + n^{1.5} \sqrt{Dm/\varepsilon})$. We present an FDO for directed graphs with the same stretch, query time, and space. It has a preprocessing time of $\widetilde{O}(mn + n^2/\varepsilon)$. The preprocessing time nearly matches a conditional lower bound for combinatorial algorithms, also by Henzinger et al. With fast matrix multiplication, we achieve a preprocessing time of $\widetilde{O}(n^{2.5794} + n^2/\varepsilon)$. We further prove an information-theoretic lower bound showing that any FDO with stretch better than $3/2$ requires $\Omega(m)$ bits of space. For multiple failures ($f>1$) in undirected graphs with non-negative edge weights, we give an $f$-FDO with stretch $(f+2)$, query time $O(f^2\log^2{n})$, $\widetilde{O}(fn)$ space, and preprocessing time $\widetilde{O}(fm)$. We complement this with a lower bound excluding any finite stretch in $o(fn)$ space. We show that for unweighted graphs with polylogarithmic diameter and up to $f = o(\log n/ \log\log n)$ failures, one can swap approximation for query time and space. We present an exact combinatorial $f$-FDO with preprocessing time $mn^{1+o(1)}$, query time $n^{o(1)}$, and space $n^{2+o(1)}$. When using fast matrix multiplication instead, the preprocessing time can be improved to $n^{\omega+o(1)}$, where $\omega < 2.373$ is the matrix multiplication exponent.
    Comment: Full version of a paper to appear at MFCS'21. Abstract shortened to meet arXiv requirement.
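For orientation, the query that an $f$-FDO answers can be computed from scratch as follows. This is a naive baseline sketch, not the oracle from the paper: it assumes an unweighted directed graph given as a list of `(u, v)` pairs over vertices `0..n-1`, runs one BFS per vertex on $G-F$, and returns infinity if some vertex cannot reach another. The function name `diameter_after_failures` is hypothetical.

```python
import math
from collections import deque


def diameter_after_failures(n, edges, failed):
    """Diameter of G - F for an unweighted directed graph.

    n      -- number of vertices, labelled 0..n-1
    edges  -- iterable of directed edges (u, v)
    failed -- iterable of failing edges F, each given as (u, v)
    """
    failed = set(failed)
    adj = [[] for _ in range(n)]
    for u, v in edges:
        if (u, v) not in failed:
            adj[u].append(v)

    best = 0
    for source in range(n):
        # Breadth-first search from source in G - F.
        dist = [math.inf] * n
        dist[source] = 0
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if dist[v] == math.inf:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        best = max(best, max(dist))
    return best
```

Each call costs $O(n(n+m))$ time, which is the cost a fault-tolerant oracle is meant to avoid at query time by moving work into preprocessing.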

    Near-Optimal Deterministic Single-Source Distance Sensitivity Oracles

    Given a graph with a source vertex $s$, the Single Source Replacement Paths (SSRP) problem is to compute, for every vertex $t$ and edge $e$, the length $d(s,t,e)$ of a shortest path from $s$ to $t$ that avoids $e$. A Single-Source Distance Sensitivity Oracle (Single-Source DSO) is a data structure that answers queries of the form $(t,e)$ by returning the distance $d(s,t,e)$. We show how to deterministically compress the output of the SSRP problem on $n$-vertex, $m$-edge graphs with integer edge weights in the range $[1,M]$ into a Single-Source DSO of size $O(M^{1/2}n^{3/2})$ with query time $\widetilde{O}(1)$. The space requirement is optimal (up to the word size) and our techniques can also handle vertex failures. Chechik and Cohen [SODA 2019] presented a combinatorial, randomized $\widetilde{O}(m\sqrt{n}+n^2)$ time SSRP algorithm for undirected and unweighted graphs. Grandoni and Vassilevska Williams [FOCS 2012, TALG 2020] gave an algebraic, randomized $\widetilde{O}(Mn^\omega)$ time SSRP algorithm for graphs with integer edge weights in the range $[1,M]$, where $\omega<2.373$ is the matrix multiplication exponent. We derandomize both algorithms for undirected graphs in the same asymptotic running time and apply our compression to obtain deterministic Single-Source DSOs. The $\widetilde{O}(m\sqrt{n}+n^2)$ and $\widetilde{O}(Mn^\omega)$ preprocessing times are polynomial improvements over previous $o(n^2)$-space oracles. On sparse graphs with $m=O(n^{5/4-\varepsilon}/M^{7/4})$ edges, for any constant $\varepsilon > 0$, we reduce the preprocessing to randomized $\widetilde{O}(M^{7/8}m^{1/2}n^{11/8})=O(n^{2-\varepsilon/2})$ time. This is the first truly subquadratic time algorithm for building Single-Source DSOs on sparse graphs.
    Comment: Full version of a paper to appear at ESA 2021. Abstract shortened to meet arXiv requirement.
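The SSRP output $d(s,t,e)$ can be tabulated by brute force, which may help to see what the oracle compresses. The sketch below assumes an undirected graph with positive integer weights given as `(u, v, w)` triples and identifies an edge by its index in that list; it reruns Dijkstra once per edge. The names `distances_avoiding` and `replacement_distances` are hypothetical, and this is neither the compression scheme nor one of the SSRP algorithms cited in the abstract.

```python
import heapq
import math


def distances_avoiding(n, edges, s, skip):
    """Dijkstra from s in the undirected graph without edge number `skip`."""
    adj = [[] for _ in range(n)]
    for i, (u, v, w) in enumerate(edges):
        if i != skip:
            adj[u].append((v, w))
            adj[v].append((u, w))

    dist = [math.inf] * n
    dist[s] = 0
    heap = [(0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))
    return dist


def replacement_distances(n, edges, s):
    """Map each edge index e to the list [d(s, t, e) for t in 0..n-1]."""
    return {e: distances_avoiding(n, edges, s, e) for e in range(len(edges))}
```

The resulting table has $m \cdot n$ entries, which is the explicit output that a Single-Source DSO stores in compressed form.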

    Fair Correlation Clustering in Forests

    The study of algorithmic fairness has received growing attention recently. This stems from the awareness that bias in the input data for machine learning systems may result in discriminatory outputs. For clustering tasks, one of the most central notions of fairness is the formalization by Chierichetti, Kumar, Lattanzi, and Vassilvitskii [NeurIPS 2017]. A clustering is said to be fair if each cluster has the same distribution of manifestations of a sensitive attribute as the whole input set. This is motivated by various applications where the objects to be clustered have sensitive attributes that should not be over- or underrepresented. Most research on this version of fair clustering has focused on centroid-based objectives. In contrast, we discuss the applicability of this fairness notion to Correlation Clustering. The existing literature on the resulting Fair Correlation Clustering problem either presents approximation algorithms with poor approximation guarantees or severely limits the possible distributions of the sensitive attribute (often only two manifestations with a 1:1 ratio are considered). Our goal is to understand if there is hope for better results in between these two extremes. To this end, we consider restricted graph classes which allow us to characterize the distributions of sensitive attributes for which this form of fairness is tractable from a complexity point of view. While existing work on Fair Correlation Clustering gives approximation algorithms, we focus on exact solutions and investigate whether there are efficiently solvable instances. The unfair version of Correlation Clustering is trivial on forests, but adding fairness creates a surprisingly rich picture of complexities. We give an overview of the distributions and types of forests where Fair Correlation Clustering turns from tractable to intractable.
As the most surprising insight, we consider the fact that the cause of the hardness of Fair Correlation Clustering is not the strictness of the fairness condition. We lift most of our results to also hold for the relaxed version of the fairness condition. Instead, the source of hardness seems to be the distribution of the sensitive attribute. On the positive side, we identify some reasonable distributions that are indeed tractable. While this tractability is only shown for forests, it may open an avenue to design reasonable approximations for larger graph classes
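The strict fairness condition quoted above (each cluster has the same attribute distribution as the whole input) can be checked directly. The sketch below assumes clusters are given as lists of object identifiers and the sensitive attribute as a dictionary `color`; both names and the function `is_fair` are hypothetical. Proportions are compared by cross-multiplication to avoid floating-point error. The relaxed fairness variant mentioned in the abstract is not covered.

```python
from collections import Counter


def is_fair(clusters, color):
    """True if every cluster has exactly the same distribution of
    attribute values as the union of all clusters.

    clusters -- list of lists of object identifiers
    color    -- dict mapping each object to its sensitive attribute value
    """
    total = Counter(color[v] for cluster in clusters for v in cluster)
    n = sum(total.values())
    for cluster in clusters:
        local = Counter(color[v] for v in cluster)
        # local[c] / len(cluster) == total[c] / n, without division
        if any(local[c] * n != total[c] * len(cluster) for c in total):
            return False
    return True
```

With two attribute values in a 1:1 ratio, for example, the check accepts exactly those clusterings in which every cluster is half one value and half the other.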

    Fault-Tolerant ST-Diameter Oracles

    We study the problem of estimating the ST-diameter of a graph that is subject to a bounded number of edge failures. An f-edge fault-tolerant ST-diameter oracle (f-FDO-ST) is a data structure that preprocesses a given graph G, two sets of vertices S,T, and positive integer f. When queried with a set F of at most f edges, the oracle returns an estimate D̂ of the ST-diameter diam(G-F,S,T), the maximum distance between vertices in S and T in G-F. The oracle has stretch σ ≥ 1 if diam(G-F,S,T) ≤ D̂ ≤ σ diam(G-F,S,T). If S and T both contain all vertices, the data structure is called an f-edge fault-tolerant diameter oracle (f-FDO). An f-edge fault-tolerant distance sensitivity oracle (f-DSO) estimates the pairwise graph distances under up to f failures. We design new f-FDOs and f-FDO-STs by reducing their construction to that of all-pairs and single-source f-DSOs. We obtain several new tradeoffs between the size of the data structure, stretch guarantee, query and preprocessing times for diameter oracles by combining our black-box reductions with known results from the literature. We also provide an information-theoretic lower bound on the space requirement of approximate f-FDOs. We show that there exists a family of graphs for which any f-FDO with sensitivity f ≥ 2 and stretch less than 5/3 requires Ω(n^{3/2}) bits of space, regardless of the query time.